User-Oriented MT Evaluation and Text Typology
Abstract
A brief survey of user-oriented MT evaluation methodologies suggests that they all suffer drawbacks in terms of time and/or generality and/or user interpretability. The time and cost of evaluation, coupled with system prices, make it increasingly less likely that full-scale pre-purchase evaluation by a single potential user will be economic. It is clearly desirable that MT systems should be “generically benchmarked” in some way to agreed standards. Regardless of the particular evaluation metrics used, generic benchmarking requires corpora of typed texts as test material, presupposing a shared notion of text type. Document structure, which is increasingly subject to international standardisation, provides a useful basis for text typing in this context. Furthermore, elements of a given document structure can be associated with linguistic diagnostics usable in statistical measures of prototypicality.

1 User-Oriented Evaluation

Potential users of an MT system want to know whether it is likely to be an economical factor in the translation of their texts. Test methodologies that we know about use either prototypical samples of end-user texts or artificial texts constructed by designers and developers, such as a suite of test sentences designed to exhibit linguistic constructions on a “one-per-sent” basis (Flickinger, 1987). From an end-user perspective, artificial texts are not necessarily the ideal basis for system evaluation:

• For a linguistically naive user, specifying that System X translates gapping constructions successfully is not informative. Even if s/he is shown an example of a gapping construction, it is not immediately obvious what strings are to count as gapping constructions.